{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "79a901c1",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "os.environ[\"OPENAI_API_KEY\"] = \"<OPENAI_API_KEY>\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d641bc4b-1d0f-4597-a2f3-68c95bd3c233",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.embeddings import NomicEmbedding"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1436a10d-d462-473f-822f-331fe6141742",
   "metadata": {},
   "outputs": [],
   "source": [
    "import logging\n",
    "import sys\n",
    "\n",
    "logging.basicConfig(stream=sys.stdout, level=logging.INFO)\n",
    "logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))\n",
    "\n",
    "from llama_index import (\n",
    "    VectorStoreIndex,\n",
    "    SimpleDirectoryReader,\n",
    "    ServiceContext,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f05e3c9f-ba59-4046-a8a4-75bff12babf2",
   "metadata": {},
   "outputs": [],
   "source": [
    "!mkdir -p 'data/paul_graham/'\n",
    "!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "04b236ab-79fa-40c1-8956-145d1523fae1",
   "metadata": {},
   "outputs": [],
   "source": [
    "documents = SimpleDirectoryReader(\"./data/paul_graham/\").load_data()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fd6e90b8-7246-4959-ac7f-5f2f6afd46c1",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "3944c255ffc64b9d843c139f4092606a",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Parsing nodes:   0%|          | 0/1 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "9d6348a8853e400a9ebd284edac4dde9",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Generating embeddings:   0%|          | 0/22 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "nomic_api_key = \"<NOMIC_API_KEY>\"\n",
    "embed_model = NomicEmbedding(\n",
    "    api_key=nomic_api_key,\n",
    "    model_name=\"nomic-embed-text-v1\",\n",
    "    query_task_type=\"search_query\",\n",
    "    document_task_type=\"search_document\",\n",
    ")\n",
    "service_context = ServiceContext.from_defaults(\n",
    "    embed_model=embed_model,\n",
    "    chunk_size=1024,\n",
    ")\n",
    "index = VectorStoreIndex.from_documents(\n",
    "    documents=documents, service_context=service_context, show_progress=True\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "55a59021",
   "metadata": {},
   "outputs": [],
   "source": [
    "search_query_retriever = index.as_retriever(\n",
    "    service_context=service_context, similarity_top_k=2\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "97ece31b",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.response.notebook_utils import display_source_node\n",
    "\n",
    "retriever_nomic = search_query_retriever"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3318ac06",
   "metadata": {},
   "outputs": [],
   "source": [
    "retrieved_nodes_nomic = retriever_nomic.retrieve(\n",
    "    \"What software did Paul write?\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f7bf2583",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "**Node ID:** 9058e2a4-6a56-4896-898d-eebd7166ca1b<br>**Similarity:** 0.6107600427118974<br>**Text:** Meanwhile I'd been hearing more and more about this new thing called the World Wide Web. Robert Morris showed it to me when I visited him in Cambridge, where he was now in grad school at Harvard. It seemed to me that the web would be a big deal. I'd seen what graphical user interfaces had done for the popularity of microcomputers. It seemed like the web would do the same for the internet.\n",
       "\n",
       "If I wanted to get rich, here was the next train leaving the station. I was right about that part. What I got wrong was the idea. I decided we should start a company to put art galleries online. I can't honestly say, after reading so many Y Combinator applications, that this was the worst startup idea ever, but it was up there. Art galleries didn't want to be online, and still don't, not the fancy ones. That's not how they sell. I wrote some software to generate web sites for galleries, and Robert wrote some to resize images and set up an http server to serve the pages. Then we tried to sign up galleries. To call this a difficult sale would be an understatement. It was difficult to give away. A few galleries let us make sites for them for free, but none paid us.\n",
       "\n",
       "Then some online stores started to appear, and I realized that except for the order buttons they were identical to the sites we'd been generating for galleries. This impressive-sounding thing called an \"internet storefront\" was something we already knew how to build.\n",
       "\n",
       "So in the summer of 1995, after I submitted the camera-ready copy of ANSI Common Lisp to the publishers, we started trying to write software to build online stores. At first this was going to be normal desktop software, which in those days meant Windows software. That was an alarming prospect, because neither of us knew how to write Windows software or wanted to learn. We lived in the Unix world. But we decided we'd at least try writing a prototype store builder on Unix. Robert wrote a shopping cart, and I wrote a new site generator for stores — in Lisp, ...<br>"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "**Node ID:** cc1c29ab-07da-4848-b6a2-1923cfdc7b75<br>**Similarity:** 0.6032412533629631<br>**Text:** What I Worked On\n",
       "\n",
       "February 2021\n",
       "\n",
       "Before college the two main things I worked on, outside of school, were writing and programming. I didn't write essays. I wrote what beginning writers were supposed to write then, and probably still are: short stories. My stories were awful. They had hardly any plot, just characters with strong feelings, which I imagined made them deep.\n",
       "\n",
       "The first programs I tried writing were on the IBM 1401 that our school district used for what was then called \"data processing.\" This was in 9th grade, so I was 13 or 14. The school district's 1401 happened to be in the basement of our junior high school, and my friend Rich Draves and I got permission to use it. It was like a mini Bond villain's lair down there, with all these alien-looking machines — CPU, disk drives, printer, card reader — sitting up on a raised floor under bright fluorescent lights.\n",
       "\n",
       "The language we used was an early version of Fortran. You had to type programs on punch cards, then stack them in the card reader and press a button to load the program into memory and run it. The result would ordinarily be to print something on the spectacularly loud printer.\n",
       "\n",
       "I was puzzled by the 1401. I couldn't figure out what to do with it. And in retrospect there's not much I could have done with it. The only form of input to programs was data stored on punched cards, and I didn't have any data stored on punched cards. The only other option was to do things that didn't rely on any input, like calculate approximations of pi, but I didn't know enough math to do anything interesting of that type. So I'm not surprised I can't remember any programs I wrote, because they can't have done much. My clearest memory is of the moment I learned it was possible for programs not to terminate, when one of mine didn't. On a machine without time-sharing, this was a social as well as a technical error, as the data center manager's expression made clear.\n",
       "\n",
       "With microcomputers, everything changed. Now you could h...<br>"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "for n in retrieved_nodes_nomic:\n",
    "    display_source_node(n, source_length=2000)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
